Empirical Evidence for Hilberg’s Conjecture in Single-Author Texts
Abstract
Hilberg’s conjecture states that the mutual information between two adjacent blocks of text in natural language scales as n^β, where n is the block length. This hypothesis has previously been linked to Herdan’s law at the levels of word frequency and of text semantics, so it merits a direct empirical test. In the present paper, Hilberg’s conjecture is tested on a selection of English prose using the Lempel-Ziv algorithm. The upper bound found for the exponent β is 0.949.
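The kind of test described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's procedure: it assumes zlib's DEFLATE (an LZ77-family compressor) as a stand-in for "the Lempel-Ziv algorithm", estimates the mutual information between adjacent blocks x and y as C(x) + C(y) - C(xy), where C is the compressed length in bits, and fits β as the slope of log estimate against log n. The function names `compressed_bits`, `block_mutual_information` and `fit_exponent` are hypothetical.

```python
import math
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib (DEFLATE) encoding of data.

    Assumption: zlib stands in for the Lempel-Ziv code; the paper's
    exact variant is not specified in the abstract.
    """
    return 8 * len(zlib.compress(data, 9))


def block_mutual_information(text: bytes, n: int) -> int:
    """Compression-based estimate C(x) + C(y) - C(xy) for two adjacent blocks of length n."""
    if len(text) < 2 * n:
        raise ValueError("text must contain at least 2 * n bytes")
    first, second = text[:n], text[n:2 * n]
    return (compressed_bits(first) + compressed_bits(second)
            - compressed_bits(first + second))


def fit_exponent(lengths, estimates):
    """Least-squares slope of log(estimate) against log(length)."""
    # Non-positive estimates have no logarithm, so they are skipped.
    points = [(math.log(n), math.log(i))
              for n, i in zip(lengths, estimates) if i > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive estimates")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    if den == 0:
        raise ValueError("block lengths must not all be equal")
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return num / den
```

With a text loaded as bytes, one would compute `block_mutual_information(text, n)` for a range of block lengths and pass the results to `fit_exponent`. DEFLATE only looks back 32 KiB, so this stand-in cannot see dependencies between blocks much longer than that, and any exponent it yields is a rough estimate only.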
Related papers
Hilberg’s Conjecture — a Challenge for Machine Learning
We review three mathematical developments linked with Hilberg’s conjecture—a hypothesis about the power-law growth of entropy of texts in natural language, which sets up a challenge for machine learning. First, considerations concerning maximal repetition indicate that universal codes such as the Lempel-Ziv code may fail to efficiently compress sources that satisfy Hilberg’s conjecture. Second,...
Hilberg’s Conjecture: an Updated FAQ
This note is a brief introduction to theoretical and experimental results concerning Hilberg’s conjecture, a hypothesis about natural language. The aim of the text is to provide a short guide to the literature. 1 What is Hilberg’s conjecture? In the early days of information theory, Shannon (1951) published estimates of conditional entropy for printed English. A few decades later, Hilberg (1990...
A New Universal Code Helps to Distinguish Natural Language from Random Texts
Using a new universal distribution called switch distribution, we reveal a prominent statistical difference between a text in natural language and its unigram version. For the text in natural language, the cross mutual information grows as a power law, whereas for the unigram text, it grows logarithmically. In this way, we corroborate Hilberg’s conjecture and disprove an alternative hypothesis ...
A Preadapted Universal Switch Distribution for Testing Hilberg's Conjecture
Hilberg’s conjecture states that the mutual information between two adjacent long blocks of text in natural language grows like a power of the block length. The exponent in this hypothesis can be upper bounded using the pointwise mutual information computed for a carefully chosen code. The bound is the better, the lower the compression rate is but there is a requirement that the code be univers...
Publication date: 2012